23 research outputs found

    Weighted linear fusion of multimodal data - a reasonable baseline?

    The ever-increasing demand for reliable inference capable of handling the unpredictable challenges of practical, real-world application has made research on information fusion of major importance. There are few fields of application and research where this is more evident than in the sphere of multimedia, which by its very nature involves the use of multiple modalities, be it for learning, prediction, or human-computer interaction. In the development of the most common type, score-level fusion algorithms, it is almost without exception desirable to have as a reference starting point a simple and universally sound baseline benchmark against which newly developed approaches can be compared. One of the most pervasively used methods is that of weighted linear fusion. It has cemented itself as the default off-the-shelf baseline owing to its simplicity of implementation, interpretability, and surprisingly competitive performance across a wide range of application domains and information source types. In this paper I argue that despite this track record, weighted linear fusion is not a good baseline on the grounds that there is an equally simple and interpretable alternative – namely quadratic mean-based fusion – which is theoretically more principled and which is more successful in practice. I argue the former from first principles and demonstrate the latter using a series of experiments on a diverse set of fusion problems: computer vision-based object recognition, arrhythmia detection, and fatality prediction in motor vehicle accidents.
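    The two score-level fusion rules contrasted in this abstract can be sketched as follows. This is a minimal illustration, not the paper's formulation: it assumes per-modality scores already normalised to [0, 1] and uses the textbook weighted arithmetic mean and weighted quadratic mean (root mean square). The function names and the example scores are hypothetical.

```python
import math


def weighted_linear_fusion(scores, weights):
    """Weighted arithmetic mean of per-modality scores."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total


def quadratic_mean_fusion(scores, weights):
    """Weighted quadratic mean (root mean square) of per-modality scores."""
    total = sum(weights)
    return math.sqrt(sum(w * s * s for w, s in zip(weights, scores)) / total)


# Hypothetical scores from two modalities, equally weighted (assumed values).
scores = [0.9, 0.3]
weights = [0.5, 0.5]

linear = weighted_linear_fusion(scores, weights)
quadratic = quadratic_mean_fusion(scores, weights)
print(f"linear: {linear:.3f}, quadratic mean: {quadratic:.3f}")
```

    For non-negative scores the quadratic mean is never smaller than the arithmetic mean with the same weights, so a confident modality pulls the fused score up more strongly than under linear fusion. The two rules agree when all modalities report the same score.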

    GhostVLAD for set-based face recognition

    The objective of this paper is to learn a compact representation of image sets for template-based face recognition. We make the following contributions: first, we propose a network architecture which aggregates and embeds the face descriptors produced by deep convolutional neural networks into a compact fixed-length representation. This compact representation requires minimal memory storage and enables efficient similarity computation. Second, we propose a novel GhostVLAD layer that includes ghost clusters, which do not contribute to the aggregation. We show that a quality weighting on the input faces emerges automatically such that informative images contribute more than those with low quality, and that the ghost clusters enhance the network's ability to deal with poor quality images. Third, we explore how input feature dimension, number of clusters and different training techniques affect the recognition performance. Given this analysis, we train a network that far exceeds the state-of-the-art on the IJB-B face recognition dataset. This is currently one of the most challenging public benchmarks, and we surpass the state-of-the-art on both the identification and verification protocols. Comment: Accepted by ACCV 201
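    The role of the ghost clusters can be illustrated with a simplified, untrained sketch of VLAD-style aggregation. The assumptions here are not taken from the abstract: soft assignments are computed from negative squared distances to fixed cluster centres (the paper's layer is learnt end to end), and the names `ghostvlad_aggregate`, `alpha` and the toy centres are hypothetical. The point shown is only that ghost clusters take part in the soft assignment but are dropped from the output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ghostvlad_aggregate(descriptors, centres, num_ghost, alpha=10.0):
    """Aggregate a set of descriptors into one fixed-length vector.

    The last `num_ghost` entries of `centres` are ghost clusters: they
    compete in the soft assignment but their residuals are discarded.
    """
    num_real = len(centres) - num_ghost
    dim = len(descriptors[0])
    residual_sums = [[0.0] * dim for _ in range(num_real)]
    for x in descriptors:
        # Soft assignment over ALL centres, real and ghost (assumed distance-based).
        logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centres]
        assign = softmax(logits)
        # Only real clusters accumulate weighted residuals.
        for k in range(num_real):
            for d in range(dim):
                residual_sums[k][d] += assign[k] * (x[d] - centres[k][d])
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Two real centres and one ghost centre in a toy 2-D descriptor space.
centres = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
good = [0.1, 0.0]      # lies near a real centre
outlier = [5.0, 5.1]   # lies near the ghost centre
with_outlier = ghostvlad_aggregate([good, outlier], centres, num_ghost=1)
without_outlier = ghostvlad_aggregate([good], centres, num_ghost=1)
```

    In this toy setting a descriptor that falls near the ghost centre is assigned almost entirely to it, so it barely changes the aggregated vector. That mirrors, in a very reduced form, the down-weighting of low-quality inputs described in the abstract.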

    Ancient Roman coin retrieval : a systematic examination of the effects of coin grade

    Ancient coins are historical artefacts of great significance which attract the interest of scholars and of a large and growing number of amateur collectors. Computer vision based analysis and retrieval of ancient coins holds much promise in this realm, and has been the subject of an increasing amount of research. The present work is in great part motivated by the lack of systematic evaluation of the existing methods in the context of coin grade, which is one of the key challenges both to humans and automatic methods. We describe a series of methods – some being adopted from previous work and others as extensions thereof – and perform the first thorough analysis to date.

    NHS underfunding and the lopsided socialized model


    Face recognition from video

    In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. In particular there are three areas of novelty: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation, learnt offline, to generalize in the presence of extreme illumination changes; (ii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve invariance to unseen head poses; and (iii) we introduce an accurate video sequence "reillumination" algorithm to achieve robustness to face motion patterns in video. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 171 individuals and over 1300 video sequences with extreme illumination, pose and head motion variation. On this challenging data set our system consistently demonstrated a nearly perfect recognition rate (over 99.7%), significantly outperforming state-of-the-art commercial software and methods from the literature. © Springer-Verlag Berlin Heidelberg 2006

    Achieving robust face recognition from video by combining a weak photometric model and a learnt generic face invariant

    In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. The central contribution is an illumination invariant, which we show to be suitable for recognition from video of loosely constrained head motion. In particular there are three contributions: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation to exploit the proposed invariant and generalize in the presence of extreme illumination changes; (ii) we introduce a video sequence re-illumination algorithm to achieve fine alignment of two video sequences; and (iii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve robustness to unseen head poses. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 323 individuals and 1474 video sequences with extreme illumination, pose and head motion variation. Our system consistently achieved a nearly perfect recognition rate (over 99.7% on all four databases). © 2012 Elsevier Ltd. All rights reserved.

    A more principled use of the p


    Discriminative extended canonical correlation analysis for pattern set matching

    In this paper we address the problem of matching sets of vectors embedded in the same input space. We propose an approach which is motivated by canonical correlation analysis (CCA), a statistical technique which has proven successful in a wide variety of pattern recognition problems. Like CCA when applied to the matching of sets, our extended canonical correlation analysis (E-CCA) aims to extract the most similar modes of variability within two sets. Our first major contribution is the formulation of a principled framework for robust inference of such modes from data in the presence of uncertainty associated with noise and sampling randomness. E-CCA retains the efficiency and closed form computability of CCA, but unlike it, does not possess free parameters which cannot be inferred directly from data (inherent data dimensionality, and the number of canonical correlations used for set similarity computation). Our second major contribution is to show that in contrast to CCA, E-CCA is readily adapted to match sets in a discriminative learning scheme which we call discriminative extended canonical correlation analysis (DE-CCA). Theoretical contributions of this paper are followed by an empirical evaluation of its premises on the task of face recognition from sets of rasterized appearance images. The results demonstrate that our approach, E-CCA, already outperforms both CCA and its quasi-discriminative counterpart constrained CCA (C-CCA), for all values of their free parameters. An even greater improvement is achieved with the discriminative variant, DE-CCA. Comment: Machine Learning, 201
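    As background for the baseline this abstract builds on, the sketch below computes the largest canonical correlation between two sets of vectors, that is, the cosine of the smallest principal angle between the linear subspaces they span. It illustrates plain CCA-style set matching only, not E-CCA or DE-CCA. It assumes each set is modelled by the span of its vectors without mean subtraction, and all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt orthonormalisation; near-dependent vectors are skipped."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def largest_canonical_correlation(set_a, set_b, iterations=200):
    """Largest singular value of Qa Qb^T, found by power iteration.

    Caveat: power iteration can stall if the all-ones start vector happens
    to be orthogonal to the dominant singular vector.
    """
    qa = orthonormal_basis(set_a)
    qb = orthonormal_basis(set_b)
    m = [[dot(a, b) for b in qb] for a in qa]  # inner products between bases
    v = [1.0] * len(qb)
    for _ in range(iterations):
        u = [dot(row, v) for row in m]  # M v
        v_new = [sum(m[i][j] * u[i] for i in range(len(qa))) for j in range(len(qb))]  # M^T u
        norm = math.sqrt(dot(v_new, v_new))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v_new]
    u = [dot(row, v) for row in m]
    return math.sqrt(dot(u, u))


# Two planes in 3-D that share the x axis, so they have a common direction.
plane_xy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
plane_xz = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
shared = largest_canonical_correlation(plane_xy, plane_xz)
```

    A value close to 1 means the two sets share at least one mode of variability. Choosing the subspace dimensionality and how many correlations to use are the free parameters the abstract says E-CCA avoids.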

    A unified framework for thermal face recognition

    The reduction of the cost of infrared (IR) cameras in recent years has made IR imaging a highly viable modality for face recognition in practice. A particularly attractive advantage of IR-based over conventional, visible spectrum-based face recognition stems from its invariance to visible illumination. In this paper we argue that the main limitation of previous work on face recognition using IR lies in its ad hoc approach to treating different nuisance factors which affect appearance, prohibiting a unified approach that is capable of handling concurrent changes in multiple (or indeed all) major extrinsic sources of variability, which is needed in practice. We describe the first approach that attempts to achieve this – the framework we propose achieves outstanding recognition performance in the presence of variable (i) pose, (ii) facial expression, (iii) physiological state, (iv) partial occlusion due to eye-wear, and (v) quasi-occlusion due to facial hair growth.
